Deconstructing Deep Learning + δeviations
In this post we will explore as many loss functions as I can find.
Loss functions are arguably one of the most important parts of a machine learning model. The loss tells the model how well it did and is what lets it learn: simply put, it is the difference between the required result and the produced one. Naturally, this looks different in every setting. In a Generative Adversarial Network (GAN) the loss function is completely different; in a WGAN it is a distance metric called the Wasserstein distance; in a U-Net the loss is the difference between two images, and so on and so forth.
Anyway, let us explore everything we can about loss functions. I first made a list of all the loss functions offered by Keras. It seems fairly comprehensive, and I had not heard of many of them, so let's see. Edit: maybe this isn't a fully comprehensive list, but I will add to it if I find more later. I realized that most of these are just small modifications of previous ones, and some are beyond my understanding right now, but I will come back to them when I get it. (I added a tiny list of the ones I don't understand yet at the bottom.)
Since I am arbitrarily hooking together loss functions from every library I can find, do let me know if you feel something is wrong :) Also note that the examples used are not necessarily the ones that would be used while training; they are random values used to test whether the code works.
$$\log\left( \frac{e^{ŷ}}{\mathrm{sum}\left( e^{ŷ} \right)} \right)$$
logsoftmax(ŷ) = log.(exp.(ŷ)/sum(exp.(ŷ)))
$$ - \frac{\mathrm{sum}\left( y \cdot \mathrm{logsoftmax}\left( ŷ \right) \cdot weight \right)}{\mathrm{size}\left( y, 2 \right)}$$
bcelogits(y, ŷ, weight) = -sum(y .* logsoftmax(ŷ) .* weight) * 1 // size(y, 2)
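A quick check on some made-up values (a one-hot target, a vector of raw logits, and uniform class weights, invented just to see that the code runs):
y = [1.0, 0.0, 0.0]       # one-hot target for a single sample
ŷ = [2.0, -1.0, 0.5]      # raw logits
weight = [1.0, 1.0, 1.0]  # per-class weights
bcelogits(y, ŷ, weight)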
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \mathrm{max}\left( 0, -y \cdot \left( x_{1} - x_{2} \right) + margin \right) \right)$$
marginranking(x1,x2,y,margin=0.0) = (1/length(y))*sum(max.(0, -y.*(x1.-x2).+margin))
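A quick check with made-up scores, where the target is +1 when x1 should rank above x2 and -1 otherwise:
x1 = [0.2, 0.8, 0.1]
x2 = [0.5, 0.3, 0.4]
t  = [1.0, -1.0, 1.0]
marginranking(x1, x2, t, 0.1)   # margin of 0.1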
if $$\left\|y - ŷ\right\| \lt 1.0$$
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( 0.5 \cdot \left( y - ŷ \right)^{2} \right)$$
else
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \left\|y - ŷ\right\| - 0.5 \right)$$
function huber(y,ŷ)
    # squared branch when every residual is below the threshold, absolute branch otherwise
    if all(abs.(y .- ŷ) .< 1.0)
        return (1/length(y))*sum(0.5 .*(y .- ŷ).^2)
    else
        return (1/length(y))*sum(abs.(y .- ŷ).-0.5)
    end
end
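A quick check with made-up values, once with all residuals under the threshold and once with a large outlier:
y  = [1.0, 2.0, 3.0]
ŷ  = [1.2, 1.8, 3.5]   # every residual is below 1, so the squared branch fires
huber(y, ŷ)
ŷ2 = [1.2, 1.8, 5.0]   # one large residual, so the absolute branch fires
huber(y, ŷ2)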
$$ - \mathrm{sum}\left( \log\left( y \right) \right)$$
nll(y) = -sum(log.(y))
nll(x,y) = nll(y)    # convenience method, the first argument is ignored
$$ - \frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( y \cdot \log\left( ŷ \right) + \left( 1 - y \right) \cdot \log\left( 1 - ŷ \right) \right)$$
bce(y,ŷ) = -(1/length(y))*sum(y.*log.(ŷ) .+ (1 .- y).*log.(1 .- ŷ))
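A quick check with made-up binary targets and predicted probabilities in (0, 1):
y = [1.0, 0.0, 1.0, 0.0]
ŷ = [0.9, 0.1, 0.6, 0.3]
bce(y, ŷ)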
$$ - \mathrm{sum}\left( y \cdot \log\left( ŷ \right) \right)$$
cce(y,ŷ) = -sum(y.*log.(ŷ))
L2 normalization scales a vector to unit length: $$\frac{x}{\sqrt{\mathrm{sum}\left( \left\|x\right\|^{2} \right)}}$$
Cosine similarity is $$ - \mathrm{sum}\left( \mathrm{l2norm}\left( y \right) \cdot \mathrm{l2norm}\left( ŷ \right) \right)$$ where l2norm is the L2-normalized vector.
l2_norm(x) = x ./ sqrt(sum(abs.(x).^2))    # the vector scaled to unit length
function cosinesimilarity(y,ŷ)
return -sum(l2_norm(y).*l2_norm(ŷ))
end
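A quick check with made-up vectors; perfectly aligned vectors should give -1 and opposite vectors +1:
y  = [1.0, 2.0, 3.0]
ŷ  = [1.0, 2.0, 3.0]
cosinesimilarity(y, ŷ)    # ≈ -1.0
ŷ2 = [-1.0, -2.0, -3.0]
cosinesimilarity(y, ŷ2)   # ≈ 1.0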
We first define xlogx to handle the edge case where $$x = 0$$ (so that $$0 \cdot \log\left( 0 \right)$$ is treated as 0): $$x \cdot \log\left( x \right)$$
Then entropy $$\frac{\mathrm{sum}\left( \mathrm{xlogx}\left( y \right) \right)}{\mathrm{size}\left( y, 2 \right)}$$
Then cce as defined before $$ - \mathrm{sum}\left( y \cdot \log\left( ŷ \right) \right)$$
Finally KLD $$\mathrm{entropy} + \mathrm{crossentropy}$$
function xlogx(x)
result = x * log(x)
ifelse(iszero(x), zero(result), result)
end
function kldivergence(y, ŷ)
    entropy = sum(xlogx.(y)) * 1 // size(y, 2)
    # cross entropy of the targets against the predictions, averaged the same way
    cross_entropy = cce(y, ŷ) * 1 // size(y, 2)
    return entropy + cross_entropy
end
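A quick check with two made-up probability distributions stored as columns; the result should be non-negative:
y = [0.1 0.4; 0.9 0.6]   # target distributions, one per column
ŷ = [0.2 0.3; 0.8 0.7]   # predicted distributions
kldivergence(y, ŷ)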
We first define the softplus function $$\log\left( e^{x} + 1 \right)$$
Then , $$x = ŷ - y$$
logcosh = $$\mathrm{mean}\left( x + \mathrm{softplus}\left( -2 \cdot x \right) - \log\left( 2.0 \right) \right)$$
using Statistics    # for mean

softplus(x) = log.(exp.(x) .+ 1)
function logcosh(y, ŷ)
    x = ŷ .- y
    return mean(x .+ softplus(-2 .* x) .- log(2.0))
end
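A quick check on made-up regression targets and predictions:
y = [1.5, 0.0, 2.0]
ŷ = [1.0, 0.5, 2.5]
logcosh(y, ŷ)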
There are two ways of reducing the mean absolute error, mean and sum. For mean,
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \left\|y - ŷ\right\| \right)$$
For sum,
$$\mathrm{sum}\left( \left\|y - ŷ\right\| \right)$$
function mae(y,ŷ,reduction= "mean")
if reduction=="mean"
return (1/length(y))*sum(abs.(y .- ŷ))
elseif reduction=="sum"
return sum(abs.(y .- ŷ))
end
end
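A quick check on made-up values with both reductions:
y = [1.0, 2.0, 3.0, 4.0]
ŷ = [1.5, 1.5, 2.5, 4.5]
mae(y, ŷ)           # mean reduction (default)
mae(y, ŷ, "sum")    # sum reduction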
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \left\|\frac{y - ŷ}{y}\right\| \right)$$
mape(y,ŷ) = (1/length(y))*sum(abs.((y .- ŷ) ./ y))
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \left( \log\left( y + 1 \right) - \log\left( ŷ + 1 \right) \right)^{2} \right)$$
msle(y,ŷ) = (1/length(y))*sum((log.(y.+1).-log.(ŷ.+1)).^2)
Mean $$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \left( y - ŷ \right)^{2} \right)$$
Sum $$\mathrm{sum}\left( \left( y - ŷ \right)^{2} \right)$$
function mse(y,ŷ,reduction= "mean")
if reduction=="mean"
return (1/length(y))*sum((y .- ŷ).^2 )
elseif reduction=="sum"
return sum((y .- ŷ).^2 )
end
end
Mean $$\sqrt{\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( \left( y - ŷ \right)^{2} \right)}$$
Sum $$\sqrt{\mathrm{sum}\left( \left( y - ŷ \right)^{2} \right)}$$
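A minimal sketch of rmse that mirrors the mse function above, with the same mean and sum reductions (the function name and reduction argument are my own choices), plus a made-up check:
function rmse(y, ŷ, reduction="mean")
    if reduction == "mean"
        return sqrt((1/length(y)) * sum((y .- ŷ).^2))
    elseif reduction == "sum"
        return sqrt(sum((y .- ŷ).^2))
    end
end
y = [1.0, 2.0, 3.0]
ŷ = [1.5, 2.5, 2.0]
rmse(y, ŷ)    # square root of the mean squared error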
$$\frac{1}{\mathrm{length}\left( y \right)} \cdot \mathrm{sum}\left( ŷ - y \cdot \log\left( ŷ \right) \right)$$
poisson(y,ŷ) = (1/length(y))*sum(ŷ .- y .* log.(ŷ))
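A quick check with made-up counts and positive predicted rates:
y = [2.0, 0.0, 1.0]   # observed counts
ŷ = [1.5, 0.5, 1.2]   # predicted rates, must be positive
poisson(y, ŷ)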
With integer class labels y (instead of one-hot vectors), sparse categorical cross entropy picks out the predicted probability of the true class: $$ - \mathrm{sum}\left( \log\left( ŷ_{y} \right) \right)$$
sparsece(y, ŷ) = -sum(log(ŷ[y[i], i]) for i in eachindex(y))    # y: integer labels, ŷ: class probabilities with one column per sample
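A quick check with made-up probabilities for three classes and two samples, using integer labels rather than one-hot vectors:
ŷ = [0.7 0.2; 0.2 0.5; 0.1 0.3]   # class probabilities, one column per sample
y = [1, 2]                        # integer class labels
sparsece(y, ŷ)                    # -(log(0.7) + log(0.5))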
$$\mathrm{sum}\left( \left( \mathrm{max}\left( 0, 1 - y \cdot ŷ \right) \right)^{2} \right)$$
squaredhinge(y,ŷ) = sum(max.(0,1 .-(y.*ŷ)).^2)
We first find the positive distance $$pos_{distance} = \left( anchor - positive \right)^{2}$$
Then the negative distance $$neg_{distance} = \left( anchor - negative \right)^{2}$$
Then the temporary loss $$loss_{1} = pos_{distance} - neg_{distance} + \alpha$$
And the final loss $$\mathrm{sum}\left( \mathrm{max}\left( loss_{1}, 0.0 \right) \right)$$
function tripletloss(anchor, positive, negative, α = 0.3)
    pos_distance = (anchor .- positive).^2
    neg_distance = (anchor .- negative).^2
    loss_1 = (pos_distance .- neg_distance) .+ α
    return sum(max.(loss_1, 0.0))
end
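A quick check with made-up embeddings, where the positive is close to the anchor and the negative is far away:
anchor   = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]
negative = [0.0, 1.0, 0.0]
tripletloss(anchor, positive, negative)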
$$\mathrm{max}\left( 0, 1 + \mathrm{max}_{y \neq t}\left( w_{y} \cdot x - w_{t} \cdot x \right) \right)$$
hinge(x,w_y,w_t) = max(0, 1 + maximum(w_y .* x .- w_t .* x))
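A quick check with made-up weights, where w_t belongs to the true class and w_y to a competing class:
x   = [1.0, 2.0]
w_t = [0.9, 0.8]
w_y = [0.2, 0.1]
hinge(x, w_y, w_t)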